Abstract



Can lateral connectivity in the primary visual cortex 
account for the time dependence and intrinsic task diffi-
culty of human contour detection? To answer this ques-
tion, we created a synthetic image set that prevents sole 
reliance on either low-level visual features or high-level 
context for the detection of target objects. Rendered 
images consist of smoothly varying, globally aligned con- 
tour fragments (amoebas) distributed among groups of 
randomly rotated fragments (clutter). The time course 
and accuracy of amoeba detection by humans were mea-
sured using a two-alternative forced choice protocol with
self-reported confidence and variable image presentation 
time (20-200 ms), followed by an image mask optimized 
so as to interrupt visual processing. Measured psychome- 
tric functions were well fit by sigmoidal functions with 
exponential time constants of 30-91 ms, depending on 
amoeba complexity. Key aspects of the psychophysical 
experiments were accounted for by a computational net- 
work model, in which simulated responses across retino- 
topic arrays of orientation-selective elements were mod- 
ulated by cortical association fields, represented as mul- 
tiplicative kernels computed from the differences in pair- 
wise edge statistics between target and distractor im- 
ages. Comparing the experimental and the computa- 
tional results suggests that each iteration of the lateral 
interactions takes at least 37.5 ms of cortical processing 
time. Our results provide evidence that cortical asso- 
ciation fields between orientation selective elements in 
early visual areas can account for important temporal 
and task-dependent aspects of the psychometric curves 
characterizing human contour perception, with the re- 
maining discrepancies postulated to arise from the influ- 
ence of higher cortical areas. 
Author Summary

Current computer vision algorithms reproducing the 
feed-forward features of the primate visual pathway still 
fall far behind the capabilities of human subjects in de- 
tecting objects in cluttered backgrounds. Here we inves- 
tigate the possibility that recurrent lateral interactions, 
long hypothesized to form cortical association fields, can 
account for the dependence of object detection accuracy 
on shape complexity and image exposure time. Corti- 
cal association fields are thought to aid object detection 
by reinforcing global image features that cannot easily 
be detected by single neurons in feed-forward models. 
Our implementation uses the spatial arrangement, rela- 
tive orientation, and continuity of putative contour el- 
ements to compute the lateral contextual support. We 
designed synthetic images that allowed us to control ob- 
ject shape and background clutter while eliminating un- 
intentional cues to the presence of an otherwise hidden 
target. In contrast, real objects can vary uncontrollably 
in shape, are camouflaged to different degrees by back- 
ground clutter, and are often associated with non-shape 
cues, making results using natural image sets difficult to 
interpret. Our computational model of cortical associa- 
tion fields matches many aspects of the time course and 
object detection accuracy of human subjects on statis- 
tically identical synthetic image sets. This implies that 
lateral interactions may selectively reinforce smooth
global object boundaries.



Introduction 

The perception of closed contours is fundamental to 
object recognition, as revealed by the fact that com- 
mon object categories can be rapidly detected in black 
and white line drawings in which all shading and lumi- 
nance cues have been removed [1]. Cortical association
fields, hypothesized to capture spatial correlations be- 
tween local image features via long-range lateral synap- 
tic interactions, provide a natural substrate for rapid 
contour perception. The link between cortical as-
sociation fields and contour perception has been inves- 
tigated through a variety of behavioral, experimental, 
and theoretical techniques [5-8]. Psychophysical mea-
surements reveal that the detection of implicit contours, 
defined by sequences of Gabor-like elements presented 
against randomly oriented backgrounds, becomes more 
difficult as the local curvature increases and as the indi-
vidual Gabor elements are spaced further apart or their 
alignment is randomly perturbed. This dependence on 
proximity and relative orientation implies that, in early 
visual areas, cortical association fields are primarily lo- 
cal and aligned along smooth trajectories [5], [7], [8]. In re-
lated studies, collinear Gabor patches have been shown to 
both increase and decrease the contrast detection thresh- 
old of a central Gabor patch in a manner that depends 
on the relative timing, orientation and spatial separation 
of the flanking elements [9-11], providing further psy-
chophysical evidence that lateral influences act at early 
cortical processing stages, although the contribution of 
collinear facilitation to contour integration remains con- 
troversial [12]. In primary visual cortex (V1), electro-
physiological recordings indicate that the responses to 
optimally oriented and positioned stimuli can be facili- 
tated by flanking stimuli placed outside the classical re- 
ceptive field center [5], [8], [10], [13], although these effects
have also been ascribed to elongated central receptive 
fields [m dl] and facilitation has been attributed to in- 
creases in baseline activity [16]. Nonetheless, collinear 
facilitation is consistent with anatomical studies indi- 
cating that orientation columns are laterally connected 
to surrounding columns with similar orientation prefer- 
ence [IMS]- 

Because extensive association fields are present in the 
primary visual cortex [17-19], lateral interactions may
be key to discriminating smooth object boundaries at 
very fast time scales (of the order of tens of ms), as ob- 
served in numerous speed of sight psychophysical exper- 
iments [1], [20-23]. Correspondingly, theoretical models
have proposed that V1 cortical association fields can be
described mathematically on the basis of cocircularity, 
and that relaxation dynamics based on cocircular asso- 
ciation fields can extract global contours by suppress- 
ing local variation. Such models are qualitatively
consistent with human judgments as to whether pairs of 
short line segments belong to the same or separate con- 
tours, with human judgments closely following the pair- 
wise statistics of edge segments extracted from natural 
scenes [2S]. Further, model cortical association fields, 
when used to detect implicit contours, can predict key 
aspects of human psychophysics, particularly the mea- 
sured dependence on the density of foreground elements 
relative to background elements [3 [5S] . 

In this paper, we extend the above studies by inves- 
tigating whether model cortical association fields can 
account not only for dependence of contour perception 
on intrinsic task difficulty, a relationship that has been 
previously explored [5] [55], but also for the detailed 



time course of human contour detection, an aspect that 
has heretofore not been modeled explicitly, although the 
time-dependent influence of lateral interactions has been 
determined for several theoretical models [371 HH] • In this 
work, we employ multiplicative relaxational dynamics to 
estimate the time course of contour detection from a com- 
putational model employing optimized kernels. Model re- 
sults are then compared to speed-of-sight measurements 
from human subjects performing the same contour de- 
tection task. To obtain optimized cortical association 
fields, we design lateral connectivity patterns using a 
novel method that exploits the global statistical prop- 
erties of salient contours relative to background clut- 
ter. Our procedure, which can be generalized beyond 
the present application, can be summarized as follows. 

We begin by generating a large training corpus, di- 
vided into target and distractor images, from which we 
obtain estimates of the pairwise co-occurrence probabil-
ity of oriented edges conditioned on the presence or ab- 
sence of globally salient contours. From the difference in 
these two probability distributions, we construct Object- 
Distractor Difference (ODD) kernels, which are then con- 
volved with every edge feature to obtain the lateral con- 
textual support at each location and orientation across 
the entire image. Edge features that receive substantial 
contextual support from the surrounding edges are pre- 
served, indicating they are likely to belong to a globally 
salient contour, whereas edge features receiving minimal 
contextual support are suppressed, indicating they are 
more likely to be part of the background clutter. The 
lateral contextual support is applied in a multiplicative 
fashion, so as to prevent the appearance of illusory edges, 
and the process is iterated several times, mimicking the 
exchange of information along horizontal connections in 
the primary visual cortex. Our method is thus intended 
to capture the essential computational elements of cor- 
tical association fields that are hypothesized to mediate 
the pop-out of salient contours against cluttered back- 
grounds. 

To obtain a large number of training images and to 
better isolate the role of cortical association fields linking 
low-level visual features, we employ abstract computer- 
generated shapes consisting of short, smooth contour seg- 
ments that could either be globally aligned to form wig- 
gly, nearly closed objects (amoebas), or else randomly
rotated to provide a background of locally indistinguish- 
able contour fragments (clutter). Amoeba targets lack 
specific semantic content, presumably reducing the in-
fluence of high-level cortical areas, such as IT. However,
our computer-generated images would not be expected to 
eliminate the contribution to contour perception from ex- 
trastriate areas [29-32]. Thus, our model of lateral inter-
actions between orientation-selective neurons is designed 
to account for just one of several cortical mechanisms 
that likely contribute to contour perception. 

Our amoeba/no-amoeba image set differs from stim-
uli used in previous psychophysical experiments that 
employed sequences of Gabor-like elements to repre- 
sent salient contours against randomly oriented back-
grounds [21 H H! • An advantage of contours represented 
by random Gabor fields is that the target and distrac- 
tor Gabor elements can be distributed at approximately 
equal densities, thereby precluding the use of local den- 
sity operators as surrogates for global contour percep- 
tion [2]. However, our amoeba/no-amoeba image set
is more akin to the natural image sets used in previ- 
ous speed-of-sight object detection tasks, particu-
larly with respect to studies employing line drawings de-
rived from natural scenes [1]. Humans can detect closed
contours, whether defined by aligned Gabor elements or 
by continuous line fragments, in less than 200 ms [TlEn],
which is shorter than the mean interval between sac-
cadic eye movements [34], thus mitigating the contribu-
tion from visual search. Like Gabor defined contours, 
our amoeba/no-amoeba image set implements a pop-out 
detection task involving readily perceived target shapes 
whose complexity can be controlled parametrically. 

To benchmark the accuracy and the time course of the 
ODD kernel-based procedure applied to the amoeba/no- 
amoeba task, we compare our model results to the per- 
formance of human subjects on a 2AFC speed-of-sight 
task in which amoeba/no-amoeba images are presented 
very briefly side by side, followed by a mask designed 
to limit the time the visual system is able to process 
the sensory input [H EDl - f25] . Since it takes an estimated 
100-300 ms for activation to spread through the ventral
stream of the visual cortex [21], an effective mask pre-
sented within this time frame can potentially degrade ob- 
ject detection performance by interfering with the neural 
processing mechanisms underlying recognition [22], [35].
By plotting task performance as a function of the stimu- 
lus onset asynchrony (SOA), the interval between image
and mask presentation onsets, the resulting psychometric
curves are hypothesized to estimate the neural processing 
time required to reach a given level of classification ac- 
curacy. Amoeba targets of low to moderate complexity 
were found to reliably pop out against the background
clutter, allowing subjects to achieve near perfect perfor- 
mance at SOAs less than 250 ms, even when followed by 
an optimized mask consisting of rotated versions of the 
target and distractor images [20] . Our model cortical as- 
sociation fields were able to account for the dependence 
of human performance on amoeba complexity as well 
as for aspects of the time course of contour perception 
as measured by the improvement in human performance 
with increasing SOA. Thus, we present the first network- 
level computational model to simultaneously account for 
spatial and temporal aspects of contour perception, as 
measured in human subjects performing the same con- 
tour detection task. Aspects of the experimental data 
for which our model fails to account, particularly data 
showing that human subjects require longer processing 
times to detect more complex targets, may indicate the 
possible involvement of extrastriate areas, which may be 
essential for the perception of more complex shapes. 




FIG. 1: Examples of targets and distractors from 
the amoeba/no-amoeba image set for different K. 

From top to bottom: K = 2, 4, 6, 8. Left column: Targets;
amoeba complexity increases with increasing numbers of ra- 
dial frequencies. Clutter was constructed by randomly ro- 
tating groups of amoeba contour fragments. Right column: 
Distractors; only clutter fragments are present. 



Results 

To investigate low-level cortical mechanisms for detect- 
ing smooth, closed contours presented against cluttered 
backgrounds with statistically similar low-level features, 
we designed an amoeba/no-amoeba detection task using 
a novel set of synthetic images (Figure 1). Amoebas are
radial frequency patterns [35] constructed via superposi- 
tion of periodic functions described by a discrete set of 
radial frequencies around a circle. In addition, we added 
clutter objects, or distractors, that were locally indis- 
tinguishable from targets. Both targets and distractors 
were composed of short contour fragments, thus elimi- 
nating unambiguous indicators of target presence or ab- 
sence, such as total line length, the presence of line end- 
points, and the existence of short gaps between opposed
line segments. To keep the bounding contours smooth,
only the lowest K radial frequencies were included in the 
linear superposition used to construct amoeba targets. 
To span the maximum range of contour shapes and sizes, 
the amplitude and phase of each radial frequency com- 
ponent were chosen randomly, under the restriction that
the minimum and maximum diameters could not exceed 
lower and upper limits. When only 2 radial frequencies 
were included in the superposition, the resulting amoe- 
bas were very smooth. As more radial frequencies were 
included, the contours became more complex. Thus, K,
the number of radial frequencies included in the superpo-
sition, provided a control parameter for adjusting target
complexity. Figure 1 shows target and distractor images
generated using different values of K. 
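
The construction described above can be illustrated with a minimal Python sketch that builds one closed radial frequency contour. The function name amoeba_contour, the uniform amplitude distribution, the radius limits, and the linear rescaling used to keep the radius within those limits are illustrative assumptions, not the procedure used to generate the actual image set; the fragmentation of contours and the generation of clutter are omitted.

```python
import numpy as np

def amoeba_contour(K, n_points=512, r_min=0.2, r_max=0.9, rng=None):
    """Return (x, y) samples of a closed radial frequency contour.

    Assumed parameterization: r(theta) is a superposition of cosines at
    the lowest K radial frequencies (k = 1..K) with random amplitudes
    and phases, rescaled linearly so that r stays within [r_min, r_max].
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    k = np.arange(1, K + 1)[:, None]                     # lowest K radial frequencies
    amp = rng.uniform(0.0, 1.0, size=(K, 1))             # random amplitudes (assumed uniform)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(K, 1))   # random phases
    wiggle = np.sum(amp * np.cos(k * theta + phase), axis=0)
    # Map the superposition into the allowed radius range (hypothetical choice).
    span = wiggle.max() - wiggle.min()
    r = r_min + (r_max - r_min) * (wiggle - wiggle.min()) / span
    return r * np.cos(theta), r * np.sin(theta)

# One contour per complexity level shown in Figure 1.
contours = {K: amoeba_contour(K, rng=np.random.default_rng(K)) for K in (2, 4, 6, 8)}
```

Because only the lowest K frequencies enter the sum, larger K permits more undulations per revolution, which is the sense in which K controls target complexity.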

Human subjects are able to infer whether two iso-
lated line segments extracted from a natural scene are 
from the same or from separate contours using only 
distance, direction and relative orientation of the two 
segments as cues [25], [37]. The performance of human
subjects is well predicted by differences in the empiri- 
cally calculated co-occurrence statistics of short line seg- 
ments drawn from either the same or from different con- 
tours. To explore the ability of cortical association fields 
to account for the perception of smooth contours, we 
developed a network-level computational model of lat- 
eral interactions between orientation-selective elements 
governed by sigmoidal (piecewise linear) input/output 
synaptic transfer functions. To model lateral inter- 
actions, we constructed "Object-Distractor Difference 
(ODD) kernels" for the amoeba/no-amoeba task by com- 
puting coactivation statistics for the responses of pairs of 
orientation-selective filter elements, compiled separately 
for target and distractor images (Figure 2). Because the
amoeba/no-amoeba image set was translationally invari- 
ant and isotropic, the central filter element may without 
loss of generality be shifted and rotated to a canonical po- 
sition and orientation. Thus the canonical ODD kernel 
was defined relative to filter elements at the origin with 
orientation π/16 (to mitigate aliasing effects). Filter el-
ements located away from the origin can be accounted 
for by a trivial translation. To account for filter elements 
with different orientations, separate ODD kernels were 
computed for 8 orientations then rotated to a common 
orientation and averaged to produce a canonical ODD 
kernel. The canonical kernel was then rotated in steps 
between 0 and π (offset by π/16) and then interpolated
to Cartesian x-y axes by rounding to the nearest integer
coordinates. 
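
To make the kernel construction concrete, the following Python sketch accumulates pairwise coactivation statistics around a single reference orientation and takes the target-minus-distractor difference. It is a simplified illustration under stated assumptions: the function names cooccurrence and odd_kernel are hypothetical, only one reference orientation is used, and the rotation to a canonical orientation, the averaging over 8 orientations, and the interpolation step described above are omitted.

```python
import numpy as np

def cooccurrence(edge_maps, radius, ref_orient=0):
    """Mean neighbor activity around active reference-orientation edges.

    edge_maps: non-empty sequence of arrays shaped (n_orient, H, W).
    Returns an array shaped (n_orient, 2*radius+1, 2*radius+1), where
    entry [p, dy+radius, dx+radius] is the average activity of
    orientation p at offset (dy, dx) from an active reference edge.
    """
    acc, count = None, 0
    size = 2 * radius + 1
    for act in edge_maps:
        if acc is None:
            acc = np.zeros((act.shape[0], size, size))
        # Zero-pad so that windows near the border stay in bounds.
        padded = np.pad(act, ((0, 0), (radius, radius), (radius, radius)))
        ys, xs = np.nonzero(act[ref_orient] > 0)
        for y, x in zip(ys, xs):
            window = padded[:, y:y + size, x:x + size]  # centered on (y, x)
            acc += act[ref_orient, y, x] * window
            count += 1
    return acc / max(count, 1)

def odd_kernel(target_maps, distractor_maps, radius):
    """Object-Distractor Difference: target minus distractor statistics."""
    return (cooccurrence(target_maps, radius)
            - cooccurrence(distractor_maps, radius))
```

Positive entries mark relative positions and orientations that co-occur more often in target images than in distractor images; negative entries mark the reverse.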

The resulting ODD kernels were generally consistent 
with the predictions of cocircular constructions [21], ex- 
cept that support was mostly limited to line elements 
lying along low curvature contours, which follows natu- 
rally from the prevalence of low curvatures in our amoeba 
training set. 

Curiously, the largest differences in the coactivation 
statistics occur close to the center of the kernel, where 



targets and distractors are presumably most similar. 
However, even at short distances, amoeba segments are 
still more likely to be aligned than clutter elements. 
Moreover, nearby pairs occur much more frequently than 
more distant pairs, amplifying their contribution to the 
difference map. Since, by design, the individual clutter 
fragments were locally indistinguishable from the tar- 
get fragments, co-occurrence statistics of oriented frag- 
ments were necessary to solve the amoeba/no-amoeba 
task. The simplest solution, adopted here, was to fo- 
cus on pairwise co-occurrences. Notably, in some neural 
preparations, pairwise interactions have been shown to 
be sufficient to account for a large fraction of all higher- 
order correlations [38], [39].

At the retinal stage, target and distractor images were 
represented as 256 x 256 pixel monochromatic, binary 
line drawings. At the next stage, corresponding to an 
early cortical processing area such as V1, a set of filters
was used to represent 8 orientations, uniformly-spaced 
and centered at each pixel, with the axes rotated slightly 
(by π/16) to mitigate aliasing artifacts. The bottom-
up responses of each orientation-selective element were 
computed via linear convolution using filters composed 
of a central excitatory subunit flanked by two inhibitory 
subunits. Each subunit was an elliptical Gaussian with 
an aspect ratio of 7 : 1, consistent with the aspect ra- 
tios of V1 simple cell receptive fields measured experi-
mentally [40] and similar to values employed in previ-
ously published models of V1 responses [41]. Likewise,
we estimate that each image pixel subtended a visual an- 
gle of approximately 0.025° (see Methods), so that each 
orientation-selective element in the model subtended a vi- 
sual angle of approximately 0.2°, consistent with physio- 
logical estimates of V1 receptive field sizes at small eccen-
tricities [42]. All subunits had the same total integrated
strength (to within a sign), whose magnitude was ad- 
justed to yield relatively clean representations of the orig- 
inal image in terms of oriented edges. The synaptic trans- 
fer function was piecewise-linear with a minimum value 
of 0.0 and a maximum value of 1.0 and a fixed threshold 
of 0.5. A finite threshold and saturation level were essen- 
tial in order to allow non-supported contour fragments 
to be suppressed while preventing well-supported frag- 
ments from growing without bound. The precise values 
used for threshold and saturation were not critical, as 
responsiveness was controlled independently by adjust- 
ing the overall integrated strength of the bottom-up and 
lateral interaction kernels (see Methods). 
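
The filter geometry described above can be sketched in Python as follows. The filter size, the Gaussian widths, and the displacement of the inhibitory flanks are illustrative assumptions (only the 7:1 aspect ratio, the 8 orientations, the π/16 offset, and the equal integrated strength of the subunits come from the text), and the name oriented_filter is hypothetical.

```python
import numpy as np

def oriented_filter(theta, size=9, sigma_long=3.5, aspect=7.0, strength=1.0):
    """Central excitatory elliptical Gaussian flanked by two inhibitory ones."""
    sigma_short = sigma_long / aspect            # 7:1 aspect ratio
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Coordinates along (u) and across (v) the preferred orientation.
    u = x * np.cos(theta) + y * np.sin(theta)
    v = -x * np.sin(theta) + y * np.cos(theta)

    def subunit(offset):
        g = np.exp(-0.5 * ((u / sigma_long) ** 2
                           + ((v - offset) / sigma_short) ** 2))
        return strength * g / g.sum()            # same integrated strength for all

    flank = 2.0 * sigma_short                    # assumed flank displacement
    # With equal-strength subunits the net integral is -strength.
    return subunit(0.0) - subunit(flank) - subunit(-flank)

N_ORIENT = 8
# Orientations uniformly spaced over [0, pi), offset by pi/16.
thetas = np.pi / 16 + np.arange(N_ORIENT) * np.pi / N_ORIENT
filters = np.stack([oriented_filter(t) for t in thetas])
```

Bottom-up responses would then be obtained by convolving the binary retinal image with each of the 8 filters and passing the result through the synaptic transfer function.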

Orientation-selective responses were modulated by 4 
successive applications of the multiplicative ODD kernel. 
Lateral support was first computed via linear convolu- 
tion of the ODD kernel with the surrounding orientation- 
selective elements, out to a radius of 32 pixels. Given 
that images were approximately 7° x 7° in extent (see 
Methods), ODD kernels spanned a total visual angle of 
approximately 1.75 degrees, roughly in correspondence 
with the estimated visuotopic extent of horizontal pro- 
jections in V1 fi5^. The previous activity of each cell
FIG. 2: ODD kernels. Top Row: For a single short line segment oriented approximately horizontally at the center (not
drawn), the co-occurrence-based support of other edges at different relative orientations and spatial locations is depicted. Axes 
were rotated by (180°/16) from vertical to mitigate aliasing effects. The color of each edge was set proportional to its co- 
occurrence-based support. The color scale ranges from blue (negative values) to white (zero) to red (positive values). Left 
panel: Co-occurrence statistics compiled from 40,000 target images. Center panel: Co-occurrence statistics compiled from
40,000 distractor images. Right panel: ODD kernel, given by the difference in co-occurrence statistics between target and
distractor kernels. Bottom Row: Subfields extracted from the middle of the upper left quadrant (as indicated by black boxes in
the top row figures), shown on an expanded scale to better visualize the difference in co-occurrence statistics between target and
distractor images. Alignment of edges in target images is mostly cocircular whereas alignment is mostly random in distractor 
images, accounting for the fine structure in the corresponding section of the ODD kernel. 



was multiplied by the current lateral support, passed 
through the piecewise-linear synaptic transfer function, 
and the process repeated for up to 4 iterations. Contour 
segments that received insufficient lateral support were 
thereby suppressed, whereas strongly supported elements 
were either enhanced or remained maximally activated. 
When applied to the amoeba/no-amoeba image set, the 
ODD kernels typically suppressed clutter relative to tar- 
get segments (Figure 3, left column).
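
A minimal Python sketch of the multiplicative update is given below. The exact form of the piecewise-linear transfer function (here, zero below threshold, linear above it, and capped at 1), the kernel layout kernel[o, p, dy, dx], the zero padding at image borders, and the gain parameter are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def transfer(x, threshold=0.5):
    """Assumed piecewise-linear form: 0 below threshold, capped at 1."""
    return np.clip(x - threshold, 0.0, 1.0)

def lateral_support(act, kernel):
    """Linear lateral support for every element.

    act:    float array shaped (n_orient, H, W).
    kernel: array shaped (n_orient, n_orient, 2r+1, 2r+1); kernel[o, p, dy+r, dx+r]
            weights presynaptic orientation p at offset (dy, dx).
    """
    _, H, W = act.shape
    r = kernel.shape[-1] // 2
    padded = np.pad(act, ((0, 0), (r, r), (r, r)))
    support = np.zeros_like(act)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # shifted[p, y, x] == act[p, y + dy, x + dx] (zero outside the image)
            shifted = padded[:, r + dy:r + dy + H, r + dx:r + dx + W]
            support += np.tensordot(kernel[:, :, dy + r, dx + r], shifted,
                                    axes=(1, 0))
    return support

def iterate(act, kernel, n_iter=4, gain=1.0):
    """Multiply previous activity by current support, then apply transfer."""
    history = [act]
    for _ in range(n_iter):
        act = transfer(gain * act * lateral_support(act, kernel))
        history.append(act)
    return history
```

Because the support multiplies the existing activity rather than adding to it, an element that is silent at the input can never become active, which is how the multiplicative form avoids illusory edges.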

When applied in a similar manner to a natural gray- 
scale image to which a hard Difference-of-Gaussians 
(DoG) filter has been applied to maximally enhance lo- 
cal contrast (see Figure 3, right column), ODD kernels
tended to preserve long, smooth lines while suppressing 
local spatial detail. Although ODD kernels were trained 
on a narrow set of synthetic images, the results exhibit 
some generalization to natural images due to the overlap 
between the cocircularity statistics (see Figure 2) of the
synthetic image set and those of natural images. 

To quantify the ability of the model to discriminate be- 
tween amoeba/no-amoeba target and distractor images,
we used the total activation summed over all orientation- 
selective elements after k iterations of the ODD kernel. 
A set of 2,000 target and distractor images was used 



for testing; test images were generated independently 
from the training images. Histograms of the total ac- 
tivation show increasing separability between target and 
distractor images as a function of the number of itera- 
tions (Figure 4). To maximize the range of shapes and
sizes spanned by our synthetic targets and distractors, 
we did not require that the number of ON retinal pixels 
be constant across images. Rather, the retinal repre- 
sentations of both target and distractor images encom- 
passed a broad range of total activity levels, although 
the two distributions strongly overlapped and there was 
no evident bias favoring one or the other. At the next 
processing stage, prior to any lateral interactions, there 
was likewise little or no bias evident in the bottom-up 
responses of the orientation-selective elements. Each it- 
eration of the multiplicative ODD kernel then caused the 
distributions of total activity for target and distractor 
images to become more separable, implying correspond- 
ing improvements in discrimination performance on the 
amoeba/no-amoeba task.
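
The separability of the two total-activity distributions can be quantified with a simple histogram overlap, sketched below in Python. The bin count, the shared bin edges, and the function name histogram_overlap are assumptions for illustration and are not taken from the analysis reported here.

```python
import numpy as np

def histogram_overlap(target_totals, distractor_totals, n_bins=50):
    """Fraction of probability mass shared by two total-activity histograms.

    Returns 1.0 for identical distributions and 0.0 for distributions
    that occupy disjoint bins.
    """
    lo = min(np.min(target_totals), np.min(distractor_totals))
    hi = max(np.max(target_totals), np.max(distractor_totals))
    bins = np.linspace(lo, hi, n_bins + 1)       # shared bin edges
    t, _ = np.histogram(target_totals, bins=bins)
    d, _ = np.histogram(distractor_totals, bins=bins)
    t = t / t.sum()
    d = d / d.sum()
    return float(np.minimum(t, d).sum())
```

Here each entry of target_totals or distractor_totals would be the activity of one test image summed over all orientation-selective elements after k iterations; a decreasing overlap with k corresponds to increasing separability.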

The general principles governing the operation of 
our model cortical association fields are conceptually 
straightforward. ODD kernels, which capture differences 
in the coactivation statistics of edge segments belong- 







FIG. 3: The effect of lateral interactions on exam- 
ple images. Left column: black and white amoeba-target 
image (K = 4). Right column: Gray-scale natural image
(the standard computer vision test image "Lena") after ap- 
plying a hard Difference of Gaussians (DoG) filter to enhance 
edges. Top row: Raw retinal input. Second row: Responses of 
orientation-selective elements before any lateral interactions 
(k = 0). To aid visualization, the activity of the maximally
responding orientation-selective element at each pixel loca- 
tion is depicted as a gray-scale intensity. Rows 3-6: Activity 
after k = 1, 2, 3, 4 iterations of the multiplicative ODD kernel, 
as labeled. For each iteration, activity was multiplied by the 
local support, computed via linear convolution of the previ- 
ous output activity with the ODD kernel. Lateral interac- 
tions tended to support smooth contours, particularly those 
arising from amoeba segments, while suppressing clutter or 
background detail. 



ing to amoebas relative to edge segments belonging to 
the background clutter, are used to determine the lateral 
contextual support for individual edge segments in an 
image. Edge segments receiving sufficiently strong sup- 
port are preserved, indicating they are likely to be part 
of an amoeba, whereas edge segments receiving insuffi- 
cient support are suppressed, indicating they are likely 
to belong to the background clutter. 

To assess the ability of the model cortical association 
fields to account for the time course of human contour 
perception, we measured the stimulus presentation time 
required for human subjects to reach a given level of accu- 
racy on an amoeba/no-amoeba task. The psychophysical 
experiment was implemented using a speed-of-sight pro- 
tocol employing a two-alternative forced choice (2AFC) 
design, with subjects using a slider bar to indicate 
which of two images, presented side-by-side, contained 
an amoeba (Figure 5). The distance the bar was dis-
placed to the left or to the right was used to indicate
confidence (see Methods). To effectively interrupt visual
processing at a given SOA, both target and distractor 
images were replaced by an optimized mask, constructed 
by combining randomly rotated amoeba and clutter seg- 
ments [20]. Our optimized masks were designed to render
the amoeba targets virtually invisible in the fused target- 
mask composite. 

As a measure of human performance on the 
amoeba/no-amoeba task, we constructed receiver oper-
ating characteristic (ROC) curves [43] (Figure 6), using
each subject's reported confidence (slider bar location rel- 
ative to the center position) as a noisy signal for estimat- 
ing which side, either left or right, contained the target 
on a given trial. True positives corresponded to trials
on which the subject reported the target was on the left
(relative to threshold) and the target was actually on the
left. False positives corresponded
to trials on which the subject reported the target was
on the left (relative to threshold) whereas the target was
actually on the right. To construct each ROC curve,
the confidence scale along the slider bar was divided into 
6 discrete threshold values. For each threshold value, a 
cumulative proportional true positive rate was calculated 
by considering only those trials as true positives in which 
the confidence value was above threshold. The cumu- 
lative proportional false positive rate for each threshold 
value was calculated similarly. Each threshold value thus 
contributed one point on the ROC curve, with true pos- 
itive rate plotted as the ordinate and the false positive 
rate as the abscissa. The complete set of points was con-
nected by straight lines to guide the eye (Figure 6), with
a separate ROC curve computed for each combination of 
SOA and target complexity. 
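
The thresholding procedure described above can be sketched in Python as follows. The signed confidence convention (positive values meaning "target on the left"), the range of the threshold grid, and the function name roc_points are assumptions made for illustration.

```python
import numpy as np

def roc_points(signal, target_on_left, thresholds):
    """One (false positive rate, true positive rate) pair per threshold.

    signal:         signed confidence per trial; larger values favor "left".
    target_on_left: boolean per trial, True if the target was on the left.
    """
    signal = np.asarray(signal, dtype=float)
    target_on_left = np.asarray(target_on_left, dtype=bool)
    # True positive: report exceeds threshold and the target was on the left.
    tpr = np.array([(signal[target_on_left] > t).mean() for t in thresholds])
    # False positive: report exceeds threshold but the target was on the right.
    fpr = np.array([(signal[~target_on_left] > t).mean() for t in thresholds])
    return fpr, tpr

# Six discrete thresholds spanning a slider range assumed to be [-1, 1].
thresholds = np.linspace(-1.0, 1.0, 6)
```

Each threshold contributes one point, so a curve built from 6 thresholds is coarse; connecting the points with straight lines, as in Figure 6, only guides the eye.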

ROC curves for quantifying the performance of the 
model on the amoeba/no-amoeba task were computed
similarly, using the difference in total luminance between 
the left and right images as the raw signal for estimat- 
ing which side contained the target on a given trial. If 
the total luminance of the left image was higher than 
FIG. 4: Histograms of total luminance in target and
distractor images as a function of the number of it- 
erations. Red bins: Total activity histograms for all 1,000 
test target images. Blue bins: Total activity histograms for
all 1,000 test distractor images. The degree that the two 
distributions overlap is shown as the gray shaded area, which 
provides a measure of whether total luminance can be used to 
distinguish targets from distractors. The percentage in each 
shaded area shows the approximate lower bound amount of 
overlap of the two histograms, for comparison. Top row: To- 
tal summed activity over all retinal pixels. Little, if any bias 
between target and distractor images was evident in the in- 
put black and white images as there is nearly complete over- 
lap between the distributions. Subsequent rows: Total activ- 
ity histograms summed over all orientation-selective elements. 
Second row: Bottom-up responses prior to any lateral inter- 
actions. Third - sixth rows: Total activity histograms after 1 
- 4 iterations of the multiplicative ODD kernel, respectively. 
Total summed activity became progressively more separable 
with additional iterations, as evinced by a decrease in the 
overlapping areas. 



that of the right (relative to threshold), the response of 
the model would be reported as target on the left. Ide- 
ally, after several iterations of the ODD kernel, no seg- 
ments would remain in the distractor image and only 
amoeba segments would remain in the target image; in 
practice, the total luminance served as a measure of con- 
fidence. Given the much larger number of trials (1000) 
available for assessing model performance, 100 equally 
spaced threshold values were used to calculate the cor- 
responding ROC curves. As with the ROC curves con- 
structed from the confidence values reported by the hu- 
man subjects, the ROC curves computed from the confi- 
dence values reported by the model give the cumulative 
proportional true positive rate as a function of cumula- 
tive proportional false positive rate, with the confidence 
threshold varied from zero to maximum. Graphically, 
the area under the ROC curves is given by the amount 
of overlap between the total luminance histograms (see
Figure 4) for the target and distractor images [41].
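
For the model, the area under the ROC curve can also be estimated directly from the total luminance values, without choosing thresholds, by comparing every target image against every distractor image. The Python sketch below uses this pairwise (rank-based) estimate; it is offered as an illustration under the assumption that ties count as half a correct decision, and the function name auc_pairwise is hypothetical.

```python
import numpy as np

def auc_pairwise(target_totals, distractor_totals):
    """Fraction of (target, distractor) pairs in which the target image
    has the larger total luminance, counting ties as one half."""
    t = np.asarray(target_totals, dtype=float)[:, None]
    d = np.asarray(distractor_totals, dtype=float)[None, :]
    return float(np.mean((t > d) + 0.5 * (t == d)))
```

A value of 0.5 corresponds to chance and 1.0 to perfect discrimination; for 1,000 targets and 1,000 distractors the estimate averages over one million simulated left/right pairings.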

ROC curves for human subjects show performance in- 
creasingly above chance, indicated by a diagonal line of 
slope 1, as a function of both increasing SOA and de- 
creasing target complexity. For amoeba targets of low 
to moderate complexity, ROC curves obtained from hu- 
man subjects were well matched to those generated by 
the model cortical association fields, consistent with the 
hypothesis that lateral interactions between orientation- 
selective neurons contribute to human contour percep- 
tion, at least for simple targets. 

The area under the ROC curve (AUC) gives the probability
that a randomly chosen target image will be correctly
classified relative to a randomly chosen distractor
image, and thus provides a threshold-independent assess- 
ment of performance on the 2AFC task. Both the aver- 
age over human subjects and the model cortical associa- 
tion fields exhibited qualitatively similar performance on 
the 2AFC amoeba/no-amoeba task (Figure 7). Performance
declined as a function of increasing target complexity,
both for human subjects, measured at a fixed
SOA, and for the model, measured at a fixed number 
of iterations, implying that K was an effective control 
parameter for adjusting task difficulty. At 20 ms SOA, 
the performance of human subjects was indistinguishable 
from chance, suggesting that our optimized masks effec- 
tively prevented the development of bottom-up cortical 
responses, even for the simplest targets (K = 2). Although
some studies report that line drawings are processed
more rapidly than natural images, with above
chance performance being observed at short SOA val- 
ues [HHnj, the fact that performance on the amoeba/no- 
amoeba task was no better than chance at a 20 ms SOA 
implies that our optimized masks effectively interrupted 
visual processing of the amoeba targets. Since the model 
used here did not include any account for the time course 
of bottom-up retinocortical dynamics, we assumed that 
the performance of human subjects at 20 ms SOA should 
be equated to model performance at 0 iterations (prior to
any lateral interactions), a time frame consistent with the
FIG. 5: Psychophysical experiment schematic. The

stimulus consisted of one target image and one distractor im- 
age (randomly positioned with equal probability on the left 
or right), presented simultaneously for an SOA between 20 
ms and 200 ms, followed by an optimized 100 ms mask gen- 
erated from randomly rotated groups of target and distractor 
segments. Subjects indicated which side contained the target 
object (amoeba) using a computer mouse to click along a hor- 
izontal slider bar. Clicking far to the left or right indicated 
strong confidence that the corresponding side contained the 
target; clicking close to the center indicated weak confidence. 
A narrow gap in the center forced subjects to choose between 
left and right. 



distribution of the shortest measured response latencies 
recorded in primary visual cortex [45].

Overall, average human performance improved as a 
function of increasing SOA in a manner analogous to 
the improvement in model performance as a function of 
the number of iterations of the ODD kernel. This cor- 
respondence was especially evident for amoebas of low 
to moderate complexity (K < 4). For more complex
targets, model performance lagged well behind that of 
human subjects. Studies suggest that low and high radial
frequencies are processed by different cortical channels
[46]. Model performance might have been improved
by training a new set of ODD kernels specifically for tar- 
gets containing K > 6 radial frequencies, thereby utiliz- 
ing a hypothetical sub-population of orientation-selective 
neurons optimized for detecting high-curvature contours. 
Here, our model was limited to a single multiplicative 
kernel for detecting all predominately smooth contours. 

To quantify how average human performance on the 
2AFC amoeba/no-amoeba task varied with SOA, and to 
compare with the dependence of model performance on 
the number of iterations of the ODD kernel, areas un- 
der both sets of ROC curves were fit to a monotonically 
increasing function of the following sigmoidal form:

f(t) = \frac{F_\infty}{1 - (1 - 2F_\infty)\, e^{-\lambda (t - t_0)}}

For human experiments, the parameter t corresponds to
the SOA in ms. Since we expect humans to perform close
to 100% accuracy for very long SOA, we set F_∞ = 1.
Since humans perform essentially at chance (50%) for 20
ms SOA, we set t_0 = 20 ms. Thus λ was the only free
parameter; fits to the average human data were denoted
by λ_h, which has units of 1/ms. Likewise, model performance
was fit to a curve with the same functional form,
with t = k measuring the number of iterations; λ_M was
used to denote curve fits to the model data. However,
visual inspection of the model data suggests that its performance
saturates at less than 100% accuracy even after
an infinite number of iterations, thus we forced the sigmoidal
curve fit to the model results to asymptote at the
final measured value of AUC: F_∞ = AUC_{k=4}(K). Since
the model performs better than chance after only 1 iteration,
we set t_0 = 0. For both the human experiments
and the model performance, the functional form of f(t)
ensures that f(t_0) = 1/2, corresponding to a minimal performance
equal to chance.
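
A one-parameter fit of this kind can be sketched as below. The sketch assumes that the sigmoid takes the logistic form f(t) = F_∞ / (1 - (1 - 2 F_∞) exp(-λ (t - t_0))), which equals 1/2 at t = t_0 and tends to F_∞ for large t; this form, the function names, the grid-search fitting method, and the SOA values in the usage note are illustrative assumptions rather than the authors' procedure.

```python
import numpy as np

def sigmoid(t, lam, t0, f_inf):
    """Assumed form: equals 1/2 at t = t0 and tends to f_inf for large t."""
    return f_inf / (1.0 - (1.0 - 2.0 * f_inf) * np.exp(-lam * (t - t0)))

def fit_lambda(t, auc, t0, f_inf, lam_grid=None):
    """Least-squares estimate of the single free parameter by grid search."""
    t = np.asarray(t, dtype=float)
    auc = np.asarray(auc, dtype=float)
    if lam_grid is None:
        # Hypothetical default range, suited to rates expressed in 1/ms.
        lam_grid = np.linspace(1e-4, 0.2, 20000)
    # Sum of squared residuals for every candidate rate constant.
    sse = [np.sum((sigmoid(t, lam, t0, f_inf) - auc) ** 2) for lam in lam_grid]
    return float(lam_grid[int(np.argmin(sse))])
```

For human data one would call `fit_lambda(soa_ms, auc, t0=20.0, f_inf=1.0)`; for model data, with t counted in iterations, a wider `lam_grid` (for example `np.linspace(0.01, 5.0, 5000)`) and `t0=0.0` would be needed.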

We find that 1/λ_h and 1/λ_M behave quite differently
as a function of K, the number of radial frequencies used
in amoeba generation (Figure 7). As anticipated for a
relaxational process governed by a single kernel, the model
data were well described by a single value of λ_M (in units
of 1/iterations), equal to 1.26. For the human subjects
data, values of λ_h decreased from 0.034 to 0.011 (in units
of 1/ms) with increasing amoeba complexity, corresponding
to lateral processing times 1/λ_h of 29.8 to 90.6 ms,
respectively. If human performance depended on only a
single set of lateral connections, then, at least in the
linear approximation case, we might expect human
performance to be well described by a single dominant time
constant, representing the dominant eigenmode of the
horizontal interactions [47, 48]. Multiple time scales in the
human performance case may emerge from any number of physiological
mechanisms not included in the present model, including 
additional non-linearities in the action of the horizontal 
connections and/or contributions to contour perception 
from extrastriate areas. Our data do not allow us to 
make a firm distinction between these possibilities. 
FIG. 6: ROC curves comparing human and model performance on the amoeba/no amoeba task. Top two rows:
ROC curves averaged over four different human test subjects using reported confidence (points). The dashed diagonal line
in each plot indicates the curve corresponding to chance. Red, blue, green, black correspond to K = 2, 4, 6, 8, respectively.
Bottom two rows: ROC curves for model cortical association fields computed from total activity histograms.



However, one possible interpretation of the present re- 
sults is that the perception of simple contours is domi- 
nated by relatively fast lateral interactions placed early in 
the visual processing pathway, thereby accounting for the 
good fit between the model and experimental results for 
targets of low to moderate complexity. Building on this 
interpretation, we postulate that the perception of more 
complex contours requires more extensive, and therefore 
slower, processing mechanisms involving higher cortical 
areas, thus explaining the discrepancy between model 
and experimental performance as target complexity in- 
creases. Under the assumption that human perception of 
simple amoeba targets (K < 4) depends primarily on recurrent
lateral interactions between orientation-selective
neurons, we can estimate the time required for each iteration
of the multiplicative ODD kernel. This time is
estimated using the K = 2 time constants from the fits:
λ_M / λ_{h,K=2} ≈ 37.5 ms per iteration, a value consistent
with estimates of lateral conduction delays within the
same cortical area [13].
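
As a check on the arithmetic, assume the per-iteration time is the model rate constant (in 1/iterations) divided by the human rate constant for K = 2 (in 1/ms), so that one unit of model time maps onto the human time axis. Using the fitted values quoted above:

```latex
\frac{\lambda_M}{\lambda_{h,K=2}}
  = \lambda_M \cdot \frac{1}{\lambda_{h,K=2}}
  \approx 1.26\ \text{iteration}^{-1} \times 29.8\ \text{ms}
  \approx 37.5\ \text{ms per iteration}
```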

Having shown that the lateral interactions based on 
multiplicative ODD kernels can account for both spatial
and temporal aspects of human contour perception, we



seek to identify model details that are essential to the 
performance reported here. First, we demonstrate that 
the proposed model is robust and does not require that 
the magnitude of the ODD kernel be carefully titrated 
to a precise value. Model performance on the 2AFC 
amoeba/no-amoeba task, measured by the area under
the ROC curve (AUC) for increasing numbers of iterations
(k = 0, 1, 2, 3, 4), was plotted for different values of
the strength of the ODD kernel, given by the total integrated
strengths of the equal and opposite target and
distractor contributions (Figure 8). The number of radial
frequencies was fixed at K = 4. Qualitatively similar performance
was obtained for ODD kernel strengths ranging
from 300 to 400. The ODD kernel used in the present 
study, whose strength was set to 325, produced near opti- 
mal performance and also exhibited monotonic improve- 
ment with increasing numbers of iterations. That perfor- 
mance was generally insensitive to the value of the main 
free parameter in the model provides strong evidence for 
the robustness of the proposed contour detection mecha- 
nism based on multiplicative lateral interactions. 

A second aspect of the model that merits scrutiny 
is the detailed structure of the ODD kernels, which 
FIG. 7: A comparison of human and model performance on the 2AFC amoeba/no amoeba task. Left: Average
human performance for different SOA in milliseconds. Right: Performance of model cortical association fields for increasing
numbers of iterations. Both panels: Accuracy, which is equivalent to area under the ROC curve, (error bars) fitted to single
sigmoidal functions (solid lines). The four curves from top to bottom correspond to K = 2, 4, 6, 8 radial frequencies.



were trained using computer-generated images in which 
the pairwise edge statistics uniquely identifying globally 
salient contours could be calculated directly. Previous 
models of contour perception typically employed much 
simpler patterns of lateral connectivity, in which excitatory
interactions were either collinear or cocircular, and
inhibitory interactions were approximately independent 
of relative orientation p i^l^nHTPS] . To determine if 
the detailed structure of the ODD kernel was critical to 
the observed performance, we repeated the amoeba/no- 
amoeba experiment using a much simpler kernel whose 
basic form was consistent with a number of previously 
published models (see Figure 8). Specifically, we used a
"Bowtie" kernel in which excitatory connections fanned
out with an opening angle of π/6 and the preferred
orientations of the pre- and post-synaptic
elements differed by no more than ±π/6. Both excitatory
and inhibitory connection strengths fell off in a
Gaussian manner, with inhibition strength being insensitive
to orientation. Although the overall accuracy of
the Bowtie kernels was lower than that achieved by the 
ODD kernels, performance on the amoeba/no-amoeba 
tasks was qualitatively similar, particularly regarding the 
general monotonic improvement with the number of it- 
erations and the absence of a sensitive dependence on 
kernel strength. Thus, we conclude that multiplicative 
lateral interactions are able to preserve smooth closed 
contours while suppressing clutter in a manner that is 
robust to broad changes in model details. 
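
A kernel with the qualitative "Bowtie" structure described above might be built as in the following sketch. Every numeric value (grid size, number of orientations, Gaussian widths, excitatory and inhibitory weights) is a hypothetical placeholder, the function name is invented for illustration, and the opening angle is interpreted here as the full angular width of each wing; none of these choices is taken from the paper.

```python
import numpy as np

def bowtie_kernel(size=33, n_orient=8, sigma_exc=6.0, sigma_inh=6.0,
                  w_exc=1.0, w_inh=0.5, opening=np.pi / 6,
                  max_dtheta=np.pi / 6):
    """Weights onto a post-synaptic element at the centre with orientation 0.

    Returns an array of shape (n_orient, size, size), indexed by the
    pre-synaptic orientation and the spatial offset (dy, dx).
    """
    half = size // 2
    dy, dx = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    dist2 = dx ** 2 + dy ** 2
    # Direction of each offset, treated as an axis in [0, pi).
    direction = np.mod(np.arctan2(dy, dx), np.pi)
    off_axis = np.minimum(direction, np.pi - direction)
    # Two wings opening along the preferred (horizontal) axis.
    in_wings = (off_axis <= opening / 2.0) & (dist2 > 0)

    thetas = np.arange(n_orient) * np.pi / n_orient
    # Orientation difference relative to 0, measured on the half-circle.
    dtheta = np.minimum(thetas, np.pi - thetas)

    exc_profile = w_exc * np.exp(-dist2 / (2.0 * sigma_exc ** 2))
    inh_profile = w_inh * np.exp(-dist2 / (2.0 * sigma_inh ** 2))
    inh_profile[half, half] = 0.0  # no self-connection

    kernel = np.empty((n_orient, size, size))
    for i, dth in enumerate(dtheta):
        # Excitation only for similar orientations inside the wings;
        # inhibition is the same for every pre-synaptic orientation.
        exc = exc_profile * in_wings if dth <= max_dtheta else 0.0
        kernel[i] = exc - inh_profile
    return kernel
```

Kernels for other post-synaptic orientations would follow by rotating the offsets and shifting the orientation index.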



Discussion 

We have shown that simple models of neural activity 
in primary visual cortex, enriched with lateral associa- 
tion kernels, reproduce some of the behavioral features 
regarding the human perception of broken closed con- 
tours. Our results agree not only with the measured de- 
pendence on contour complexity but also with the tem- 
poral dependence of human perception as a function of 



SOA, suggesting that horizontal connections in V1 may
play a non-trivial and global computational role in the 
perception of closed contours on very fast timescales. 

A number of studies relate to the potential contribu- 
tion of cortical association fields to human contour per- 
ception; these encompass a range of anatomical, physi- 
ological, psychophysical, and theoretical techniques [21- 
U [THini [101 HH m UMI [SS]. in particular, a number 
of theoretical models have sought to account for human 
contour perception at the level of biologically-plausible 
neural circuits [HI [27l [Ml [IS ISltiM]. with most studies 
incorporating some form of cortical association field configured
to reinforce smoothness [24]. Although biologically
plausible models of cortical association fields have
been used to account for the dependence of contour vis- 
ibility on key parameters controlling task difficulty, such 
as smoothness, closure, and density of background clutter,
model cortical association fields have not been
directly compared to the time course of human contour 
perception as a function of contour complexity. Here, 
we used cortical association fields based on ODD ker- 
nels, which were computed from differences in the pair- 
wise coactivation statistics of orientation-selective ele- 
ments arising from target as opposed to distractor im- 
ages. While we designed the kernels specifically for the 
amoeba-clutter disambiguation, we emphasize that the 
algorithm for the ODD kernel construction is completely 
general and can be used to improve detection of salient 
image features in any situation where generative models
of targets and distractors are known, or there exist
data sets of sufficient size to characterize the contour co- 
occurrence statistics empirically for both targets and dis- 
tractors. In our experiments, ODD kernels were able to 
account for the experimentally observed variations in the 
saliency of closed contours as a function of parametric 
complexity and for the time course with which smooth 
contours are processed by cortical circuits. Crucial for 
these results was our use of a synthetic target/distractor
data set with controllable complexity and the absence 
of top-down contextual features or local cues that might 






give away target presence. 
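
The idea of building a kernel from differences in pairwise coactivation statistics can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the array layout, the periodic boundary handling via `np.roll`, the normalisation that makes target and distractor contributions equal and opposite in total, and all function names are hypothetical.

```python
import numpy as np

def coactivation(acts, radius):
    """Mean pairwise coactivation per orientation pair and spatial offset.

    acts: array (n_images, n_orient, H, W) of non-negative activities.
    Returns an array (n_orient, n_orient, 2*radius+1, 2*radius+1).
    """
    n_img, n_or, h, w = acts.shape
    size = 2 * radius + 1
    stats = np.zeros((n_or, n_or, size, size))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # shifted[..., y, x] == acts[..., y + dy, x + dx] (periodic edges).
            shifted = np.roll(acts, shift=(-dy, -dx), axis=(2, 3))
            # Sum of products over images and positions, per orientation pair.
            stats[:, :, dy + radius, dx + radius] = np.einsum(
                "nihw,njhw->ij", acts, shifted)
    # Discard the pairing of each element with itself.
    stats[np.arange(n_or), np.arange(n_or), radius, radius] = 0.0
    return stats / (n_img * h * w)

def odd_kernel(target_acts, distractor_acts, radius=4, strength=1.0):
    """Difference of normalised target and distractor coactivation statistics."""
    c_t = coactivation(target_acts, radius)
    c_d = coactivation(distractor_acts, radius)
    # Each contribution integrates to `strength`, with opposite sign.
    return strength * (c_t / c_t.sum() - c_d / c_d.sum())
```

Entries of the result are positive for element pairs that co-occur more often in targets than in distractors and negative otherwise, and the kernel sums to zero by construction.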

Here, we used a semi-supervised training scheme to 
learn lateral connectivity patterns optimized for performing
the amoeba/no-amoeba task. Necessarily, we sought
to model only a subset of the lateral interactions be- 
tween orientation-selective neurons, namely, those hori- 
zontal connections configured to reinforce smooth, closed 
contours. We did not attempt to capture the full range of 
spatial relationships between features extracted at early 
cortical processing stages [Ml |SS] . Presently, databases 
containing sufficient numbers of fully annotated and seg- 
mented natural images needed to reproduce the weeks 
(or months) of visual experience required to train the 
full complement of horizontal connections in the primary 
visual cortex do not exist. Moreover, the computational 
resources to exploit such databases, even if they did ex- 
ist, are highly non-trivial to assemble. Thus, we focused 
here on a subset of horizontal connections for which it 
was possible to construct synthetic surrogate images. At 
most, the proposed model represents a subset, and only
a subset, of the lateral connections between orientation-selective
cortical neurons. Moreover, even a complete set
of such horizontal connections would, at most, represent 
but a subset of the cortical mechanisms that contribute 
to the time course and shape-dependence of contour per- 
ception. 

FIG. 8: A comparison of ODD and simpler "Bowtie"
kernel performance on the 2AFC, K = 4
amoeba/no amoeba task plotted as a function of the
number of iterations for a range of different kernel
strengths. Line width and marker size denote values of
kernel strength, which was the main free parameter in the
model. Kernel strength is a dimensionless constant. Black 
lines: ODD kernel performance. Blue lines: "Bowtie" ker- 
nel performance. Qualitative behavior was similar for both 
kernels, demonstrating that multiplicative lateral interactions 
act robustly to reinforce smooth closed contours. 

The supervised training scheme employed here might 
be related to perceptual learning phenomena, which take 
place over time scales much shorter than those typically
associated with developmental processes [55-58]. It
is possible that known physiological mechanisms, such 



as spike-timing-dependent plasticity (STDP), especially
when realistic conduction delays are taken into account [59], could
mediate a rapid refinement of lateral connections so as 
to facilitate the perception of amoeba targets. Moreover, 
physiological plasticity mechanisms might produce differ- 
ent patterns of connectivity for orientation-selective ele- 
ments representing points of low as opposed to high local 
curvature, thereby optimizing lateral interactions for con- 
tours of varying complexity. Here, we made no attempt 
to customize distinct ODD kernels for detecting contours 
of varying complexity. Instead, a single ODD kernel was 
trained using a complete set of images in which differ- 
ent numbers of radial frequency components were equally 
represented. Although we did not investigate whether, or 
to what extent, the performance of human subjects im- 
proved over the course of the amoeba/no-amoeba experi- 
ment, such investigations might shed insight into the role 
of perceptual learning in the detection of closed contours. 

The question of how lateral connectivity based on ODD 
kernels might be acquired during development was not 
addressed explicitly. In principle, coactivation statistics 
between pairs of orientation-selective neurons could be 
accumulated over time in an unsupervised manner by 
a Hebbian-like learning rule [60]. Under natural viewing
conditions, we expect that contour fragments consistent
with smooth, closed boundaries would tend to occur
simultaneously, whereas contour fragments inconsistent
with object boundaries would tend to occur at random temporal
delays. Thus, a Hebbian-like learning rule sensitive
to temporal correlations, such as certain mathematical
forms of STDP-like learning rules [61], might under normal
developmental conditions lead to connectivity patterns
that reinforce smooth contours.

Of course, human contour perception may have noth- 
ing to do with cortical association fields, or lateral in- 
teractions may play a subordinate role. Early mod- 
els showed how spatial filtering could enhance texture- 
defined contours in the absence of orientation-specific 
interactions [4], and short-range lateral interactions can
accentuate texture-defined boundaries [STJ |62]- How- 
ever, psychophysical studies employing implicit con- 
tours J2i iTj i8j , in which foreground and background ele- 
ments are present at equal density and which lack explicit 
texture cues, appear to rule out explanations that omit 
long-range, orientation-specific interactions. An influen- 
tial class of biologically-inspired computer vision models 
achieves a degree of viewpoint-invariant object recogni- 
tion by constructing feed-forward hierarchies to extract 
progressively more complex and viewpoint invariant features
[33, 63]. By analogy with such models, scale- and
position-independent representations for detecting long, 
smooth contours could in principle be constructed hier- 
archically, starting with simple edge detectors and build- 
ing up progressively longer, more complex curves using 
a "bag-of-features" approach. Presently, there appear 
to be insufficient data to decide whether human con- 
tour perception involves primarily lateral, feed-forward, 
or even top-down connections [301 1311 IM]- Hypothet- 






ically, the cortical association fields used in the present 
study could have been implemented as a feed-forward ar- 
chitecture, using a hierarchy of orientation-selective neu- 
rons to link progressively more widely separated contour 
fragments. Functionally, there may not exist a clean 
distinction between lateral, feed-forward and feed-back 
topologies, with the possibility that all three types of 
connectivity contribute to human contour perception. 

To quantify the temporal dynamics underlying visual 
processing, we performed speed-of-sight psychophysical 
experiments that required subjects to detect closed con- 
tours (amoebas) spanning a range of shapes, sizes and 
positions, whose smoothness could be adjusted paramet- 
rically by varying the number of radial frequencies (with 
randomly chosen amplitudes). To better approximate 
natural viewing conditions, in which target objects usu- 
ally appear against noisy backgrounds and both fore- 
ground and background objects consist of similar low- 
level visual features, our amoeba/no-amoeba task re- 
quired amoeba targets to be distinguished from locally 
indistinguishable open contour fragments (clutter). For 
amoeba targets consisting of only a few radial frequencies
(K < 4), human subjects were able to perform at
close to 100% accuracy after seeing target/distractor im- 
age pairs for less than 200 ms, consistent with a number 
of studies showing that the recognition of unambiguous 
targets typically requires 150-250 ms to reach asymptotic
performance [2H [521 [Si], here likely aided by the
high intrinsic saliency of closed shapes relative to open 
shapes [7]. Because mean inter-saccade intervals are also
in the range of 250 ms [34], speed-of-sight studies indi- 
cate that unambiguous targets in most natural images 
can be recognized in a single glance. Similarly, we found 
that closed contours of low to moderate complexity read- 
ily "pop out" against background clutter, implying that 
such radial frequency patterns are processed in parallel, 
presumably by intrinsic cortical circuitry optimized for 
automatically extracting smooth, closed contours. As 
saccadic eye movements were unlikely to play a signif- 
icant role for such brief presentations, it is unclear to 
what extent attentional mechanisms are relevant to the 
speed-of-sight amoeba/no-amoeba task. 
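
A closed contour whose smoothness is controlled by the number of radial frequencies can be sketched as a truncated Fourier series in polar angle. The text specifies only that amplitudes were chosen randomly; the particular radius formula, the amplitude and phase distributions, the parameter values, and the function name below are hypothetical, and the sketch produces only the continuous outline, not the fragmentation into segments or the surrounding clutter.

```python
import numpy as np

def amoeba_outline(n_freq, n_points=256, mean_radius=1.0, max_amp=0.5,
                   rng=None):
    """Closed outline with n_freq radial frequency components.

    Returns x and y coordinates of n_points samples along the contour.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    # Amplitudes sum to at most max_amp < 1, so the radius stays positive.
    amps = rng.uniform(0.0, max_amp / n_freq, size=n_freq)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n_freq)
    k = np.arange(1, n_freq + 1)
    # Radius as a sum of cosines at integer radial frequencies 1..n_freq.
    modulation = (amps[:, None]
                  * np.cos(k[:, None] * theta + phases[:, None])).sum(axis=0)
    r = mean_radius * (1.0 + modulation)
    return r * np.cos(theta), r * np.sin(theta)
```

Larger `n_freq` admits higher-curvature wiggles in the outline, which is the sense in which the number of radial frequencies acts as a complexity parameter.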

Our results further indicate that subjects perform no 
better than chance at SOAs shorter than approximately 
20 ms. Other studies, however, report above chance 
performance on unambiguous target detection tasks at 
similarly short SOA values [D HI [M] [S3] . The discrep- 
ancy may be attributed to the different masks employed. 
Whereas the above cited studies used masks consisting 
of either spatially filtered (e.g. 1/f) noise, distractor
images, or scrambled versions of the target image set, 
we constructed rotation masks that were optimized for 
each target/distractor image pair [20]. Our working hypothesis
was that an optimized mask should completely
obscure the target object in the target-mask composite
image, an approach also referred to as pattern masking. The
requirement that the mask completely hide the target follows
from the assumption that at very short SOA, the tar- 



get and mask images are likely to be effectively fused 
due to the finite response time of neurons and recep- 
tors in the early visual system [5S]. For the amoeba/no- 
amoeba task, we created optimized masks by rotating the 
amoeba and clutter fragments with the goal of produc- 
ing the maximum amount of interference in the responses 
of orientation-selective cells. Presumably, maximum in- 
terference occurs when orientation-selective neurons are 
presented with randomly rotated contour fragments in 
rapid succession. Although backward masks can have 
heterogeneous effects, with performance in some cases 
showing a U-shaped dependence on SOA [SB], for the
masks used here performance always increased monoton- 
ically with SOA. Empirically, the fact that performance 
was no better than chance at 20 ms SOA suggests that 
our optimized masks were able to effectively interrupt 
the processing of smooth, closed contours at early corti- 
cal processing stages. Indeed, the ability to drive overall 
performance down to chance at SOA values shorter than 
20 ms could provide an operational criterion for assessing
the degree to which a given backward pattern mask is 
able to effectively interrupt visual processing. 

The amoeba/no-amoeba task required the integration 
of information over length scales spanning viewing angles 
of approximately 3-4°, larger than the classical excitatory
receptive field size of parafoveal V1 neurons. The
amoeba/no-amoeba image set (see Figure 1) was configured
so that purely local information, such as a few
adjoining contour fragments, would not be sufficient to 
solve the target detection problem. Rather, distinguish- 
ing amoebas from clutter required integrating global in- 
formation across multiple contour fragments. Our re- 
sults suggest that such global integration can be accom- 
plished via lateral interactions between local, orientation- 
selective filters. Although the density of target and 
clutter segments was not precisely equilibrated in our 
amoeba/no-amoeba image set, the wide range of target
sizes and shapes spanned by our image generation algo- 
rithm makes it unlikely that the near perfect performance 
of human subjects at long SOA could have been attained 
using density cues alone [4]. Here, lateral inputs were
used to modulate the bottom-up responses in a multi- 
plicative fashion, so that our cortical association fields 
acted primarily as gates that suppressed contour frag- 
ments that did not receive sufficiently strong contextual 
support. By preventing lateral inputs from producing activity
unless there was already a strong bottom-up input,
a multiplicative non-linearity prevented the activation of 
contour fragments not present in the original image. 
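
The gating behavior described above can be illustrated with a one-dimensional toy example. The exact update rule is not specified in this passage, so the gain function, the clipping to [0, 1], the kernel values, and the function name below are hypothetical choices made only to show the qualitative effect: elements with contextual support grow, isolated elements decay, and silent elements remain silent.

```python
import numpy as np

def multiplicative_step(activity, kernel):
    """One iteration in which lateral input scales the existing activity."""
    lateral = np.convolve(activity, kernel, mode="same")
    # Lateral input acts only as a non-negative gain on existing activity.
    gain = np.clip(1.0 + lateral, 0.0, None)
    return np.clip(activity * gain, 0.0, 1.0)

# Toy input: a run of six supported elements and one isolated element.
initial = np.zeros(20)
initial[3:9] = 0.5   # "contour"
initial[15] = 0.5    # "clutter"

# Hypothetical kernel: support from neighbours, suppression at the centre.
kernel = np.array([0.2, 0.6, -0.4, 0.6, 0.2])

activity = initial.copy()
for _ in range(4):
    activity = multiplicative_step(activity, kernel)
```

Because the update multiplies the current activity, positions that start at zero stay at zero regardless of the lateral input they receive.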

The phenomenon of illusory contours suggests that in 
some cases contextual effects can produce activity even in 
the absence of a direct bottom-up response [30] . The pre- 
cise form of the multiplicative interaction used here was 
adopted for algorithmic simplicity rather than for biolog- 
ical realism. We observed that including a small additive 
contribution from the lateral interactions did not funda- 
mentally affect our conclusions. This suggests that ODD 
kernels, if implemented more generally, might account for 
the perception of illusory contours as well. However, a
more realistic description of the underlying cellular and 
synaptic dynamics would likely be necessary to model a 
relaxation process that includes both additive and mul- 
tiplicative elements. 

Both the model and the psychophysical experiments 
employed a 2AFC design (see Figure 5) in which the
goal was to correctly identify which of a pair of images 
contained an amoeba target. Since each trial involved 
a forced choice between two images, the model used a 
simple classifier that labeled the image with greater to- 
tal activity as the target. For both human subjects and 
the model, the number of radial frequencies K proved to 
be a good control parameter for adjusting task difficulty 
(see Figure 7). For targets of low to moderate complexity,
both model performance (as a function of number
of iterations) and human performance (as a function of
increasing SOA) monotonically approached nearly perfect
asymptotic performance as described by a single sigmoidal
function with a characteristic scale, representing
either time or number of iterations, that increased with K
(see Figure 7). Based on comparison with human performance
at different SOA values, each iteration of the ODD
kernels was estimated to require approximately 37.5 ms
of cortical processing time, consistent with measured conduction
delays between laterally connected cortical neurons
[13].

Prior to any lateral interactions, the stimulus was pro- 
jected onto a retinotopic array of orientation-selective fil- 
ter elements, providing a convenient representation for 
learning cortical association fields by computing differ- 
ences in pairwise coactivation statistics between target 
and distractor images. We found that each iteration of 
the ODD kernel increased the activity of contour frag- 
ments that were part of amoebas compared to the activ- 
ity of clutter fragments, so that after several iterations 
the mean overall activity, summed across all orientation- 
selective filter elements, was higher on average for target 
images than distractor images (see Figure 4). Even in trials
that were incorrectly classified, contour fragments belonging
to amoebas were typically still favored relative to
background clutter. Because the total number of contour 
fragments varied from trial to trial, with only the average 
number of fragments being fixed across the entire image 
set, our relatively crude criterion for discriminating be- 
tween target and distractor images sometimes led to clas- 
sification errors even when amoeba fragments had been 
partially segmented from the background clutter, sim- 
ply because the distractor image initially contained more 
fragments. A more sophisticated classifier might have 
led to a closer correspondence between model and hu- 
man performance. Although performance of the present 
multiplicative model appeared to saturate after only a 
few iterations of the ODD kernel (e.g. k < 4), it is possible
that a different implementation might have continued
to show improvements after additional iterations. How- 
ever, the longer processing time implied by additional 
iterations suggests that other physiological mechanisms,



particularly visual search, would likely come into play. 
Granted, there is an apparent mismatch between the 
fading of clutter elements in the model and the persis- 
tence of such elements perceptually in human subjects. 
To reconcile this apparent mismatch, it has been sug- 
gested that the initial perception of brightness might be 
driven by the initial bottom-up response of the individual
orientation-selective feature detectors, whereas persistent 
responses across these same feature detectors might drive 
salience [28].

The amoeba/no-amoeba image set was designed to al- 
low for parameterized complexity (in terms of the amount 
of clutter, number of radial frequencies, etc.) while avoid- 
ing reference to exogenous world knowledge. Since the 
amoeba/no-amoeba image set was machine generated, it
was possible to produce a very large number of training
images; 40,000 target and 40,000 distractor images
at 256 × 256 pixel resolution were used to train ODD
kernels in the present study. Many computer vision systems
employ standard image classification datasets such
tems employ standard image classification datasets such 
as the Caltech 101 [UT, which allows for uniform bench- 
marking and thus facilitates direct comparison between 
models. Datasets based on natural images, however, suf- 
fer from several shortcomings. First, the resolution and 
number of images are fixed when the set is created. While 
some man-made datasets, such as MNIST [M]), consist 
of tens of thousands of handwritten characters, anno- 
tated sets of natural photographs ideal for speed-of-sight 
experiments are typically limited to a few hundred im- 
ages. In contrast, humans are exposed to millions of 
natural scenes during visual development. Biologically 
motivated models that attempt to replicate human per- 
formance might require similar numbers of examples. A 
second shortcoming of natural image datasets is the prevalence
of high-level contextual information that utilizes exogenous
world knowledge, such as the increased a priori
likelihood of finding a car on a road, or an animal in a forest.
Exploiting such exogenous world knowledge poses
a formidable challenge for existing computational models 
and, on tasks that employ natural images, may obscure 
the ability of such models to extract behaviorally mean- 
ingful information from low-level visual cues. Third, nat- 
ural image datasets typically provide limited capability 
for adjusting intrinsic task difficulty. For example, one
widely used dataset [33] includes photographs of animals
at different distances, but only a few discrete distances
are annotated and the relationship of target distance to
task difficulty is not easily quantified. Here, we illustrated
how a synthetic set of images could be used to
compare model and human performance in a task with 
parametric difficulty, potentially validating the use of ar- 
tificial as opposed to natural images. 

The present study addressed the role of cortical associ- 
ation fields in the perception of closed contours, which are 
presumably important for detecting visual targets based 
on shape or outline. Although studies show that human 
subjects can rapidly distinguish between images contain- 
ing target and non-target object categories using only 
the line drawings obtained by filtering natural scenes [T],
normal experience involves a number of complementary 
visual cues, such as texture, color, motion and stereopsis.
Presumably, cortical association fields also act to reinforce
features representing these complementary visual
cues. Human subjects, for example, can distinguish
whether pairs of texture patches were drawn from
the same natural object or two different natural objects
in a manner that exhibits a dependence on pairwise
co-occurrence statistics similar to that found for oriented
edges [SS]. We may speculate that an analysis of coactivation
statistics for features selective to a combination of
cues such as local orientation, texture, color, motion, and 
disparity may lead to a more general and more powerful 
set of kernels capable of fast and effective determination 
of global object properties, which in turn can play an 
important role in complex object identification. 

Methods 

Synthetic amoeba/no-amoeba image set 

An amoeba is a type of radial frequency pattern [36]
consisting of a deformed circle in which the radius varies 
as a function of the polar angle. By choosing the number 
and relative amplitudes of the different frequency com- 
ponents, the radius can describe an arbitrarily complex 
shape, exactly analogous to how a Fourier basis can be 
used to construct an arbitrary waveform on a finite in- 
terval. Each radial frequency component was represented 
by a sinusoidal function defined at C = 1024 discrete polar
angles, spaced uniformly on the interval $[0, 2\pi)$. The
cutoff radial frequency used in constructing the closed 
contour provided a control parameter for regulating the 
complexity of the resulting figure, which ranged from 
nearly circular, when only the 2 lowest radial frequen- 
cies had non-zero amplitudes, to highly sinusoidal and 
irregular, when the first 8 radial frequencies had non- 
zero amplitudes. All amoeba shapes generated here may 
be considered smooth, in that local curvature was always 
bounded. 

In detail, the radius of an amoeba at each discrete polar angle
$\theta_c = 2\pi c / C$ was:

$$r(\theta_c) = A_0 + \sum_{n=1}^{K} A_n \cos(n\theta_c + \alpha_n). \qquad (2)$$

All amplitudes $A_n$ were initially drawn from normal distributions
with mean 0 and unit variance. All phases
$\alpha_n$ were drawn from uniform distributions over the interval
0 to $\pi/2$. The resulting radial frequency pattern
was then linearly rescaled so that the maximum radius,
$r_{\max}$, was equal to a random number drawn from a uniform
distribution such that $L/4 < r_{\max} < L/2$, where $L$
is the linear size of the square image ($L = 256$ pixels),
and the minimum radius was given by a second randomly
chosen value so that $r_{\max}/4 < r_{\min} < r_{\max}/2$. Uniform
pseudo-random numbers were generated by the intrinsic
MATLAB 7.0 function RAND, or its Octave 3.2.3 equivalent.
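
The construction of a single amoeba radius profile can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB scripts used in the study; the function name `amoeba_radius`, the zero mean of the amplitudes, and the zero lower bound of the phase interval are assumptions made for the example.

```python
import math
import random

def amoeba_radius(K, C=1024, L=256, rng=random):
    """Return (radii, r_min, r_max) for one radial frequency pattern.

    radii[c] is the radius at polar angle 2*pi*c/C, built from K radial
    frequencies and then linearly rescaled to the range [r_min, r_max].
    """
    # Assumed: zero-mean, unit-variance amplitudes; phases uniform on [0, pi/2].
    A = [rng.gauss(0.0, 1.0) for _ in range(K + 1)]      # A[0] is the constant term
    alpha = [rng.uniform(0.0, math.pi / 2) for _ in range(K + 1)]

    raw = []
    for c in range(C):
        theta = 2.0 * math.pi * c / C
        raw.append(A[0] + sum(A[n] * math.cos(n * theta + alpha[n])
                              for n in range(1, K + 1)))

    # Random target extrema: L/4 < r_max < L/2 and r_max/4 < r_min < r_max/2.
    r_max = rng.uniform(L / 4, L / 2)
    r_min = rng.uniform(r_max / 4, r_max / 2)

    # Linear rescaling of the raw pattern onto [r_min, r_max].
    lo, hi = min(raw), max(raw)
    scale = (r_max - r_min) / (hi - lo)
    radii = [r_min + (r - lo) * scale for r in raw]
    return radii, r_min, r_max
```

Because the rescaling is linear, the shape of the pattern is preserved while its overall size varies from image to image.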

To facilitate the construction of locally indistinguish- 
able clutter and model contour occlusion in natural im- 
ages, amoeba contours were divided into 16 periodically- 
spaced fragments by removing short sections whose 
lengths varied within a specified range. Specifically,
16 gaps were inserted periodically, spaced C/16 angular
steps apart, with widths that varied from 16 to 32 in
units of discrete polar angle ($2\pi/C$). Gaps were deleted from the
underlying contour, so that the polar angle subtended by
each fragment varied in accordance with the changes in
preceding gap width. The starting point of the first gap
was chosen randomly on the interval [1, C/16], so that over
the entire image set the inserted gaps were distributed
uniformly around the circle.
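
The gap insertion step can be sketched as below. This is an illustrative Python version, not the original MATLAB code; the names `fragment_mask` and `count_fragments` are hypothetical, and the start of the first gap is assumed to be drawn from the first C/16 discrete angles.

```python
import random

def fragment_mask(C=1024, n_gaps=16, gap_min=16, gap_max=32, rng=random):
    """Return C booleans: True where the contour is kept, False inside a gap."""
    keep = [True] * C
    period = C // n_gaps                # gaps begin every C/16 angular steps
    start = rng.randint(1, period)      # assumed: random offset of the first gap
    for g in range(n_gaps):
        width = rng.randint(gap_min, gap_max)
        first = start + g * period
        for c in range(first, first + width):
            keep[c % C] = False         # wrap around the closed contour
    return keep

def count_fragments(keep):
    """Count runs of kept samples around the closed contour."""
    # keep[c - 1] wraps to the last sample when c == 0.
    return sum(1 for c in range(len(keep)) if keep[c] and not keep[c - 1])
```

Since the maximum gap width (32) is smaller than the gap spacing (C/16 = 64), the gaps never merge and every contour is split into exactly 16 fragments.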

To create clutter fragments, an amoeba was first gen- 
erated using the above procedure. Consecutive amoeba 
fragments were then grouped, with the number of frag- 
ments in each group determined by a Poisson process 
with a mean value of 2 and an upper cutoff of 3. Each 
group of amoeba fragments was then rotated about its 
center of mass through random angles on the interval 
$\pi/8$ to $7\pi/8$. The resulting clutter consisted of the same
fragments as the original amoeba but rotated so that 
collectively the rotated fragments no longer supported 
the perception of a closed object. Clutter fragments 
constructed in this manner were thus locally indistin- 
guishable from amoeba fragments. To create clutter in 
both target and distractor images, several amoebas were 
first superimposed at random positions and then groups 
of fragments rotated following the procedure described 
above. All amoebas contained the same total number 
of contour fragments (and therefore the same number of 
gaps) but varied in both maximum diameter and total 
contour length. 

The center of each amoeba was chosen randomly un- 
der the restriction that no contour be allowed to cross 
an image boundary. Specifically, the x-coordinate of the 
amoeba center, $x_0$, was chosen randomly on a restricted
interval, $r_{\max} < x_0 < L - r_{\max}$, and likewise for the y-coordinate,
$y_0$. When groups of amoeba fragments were
randomly rotated to make clutter, portions of a contour 
belonging to a clutter fragment would occasionally cross 
an image boundary. In such cases, any out-of-bounds 
portions of a contour were reflected back into the image 
region using mirror boundary conditions. 
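
A minimal sketch of the center selection and of the mirror boundary condition is given below. The helper names are hypothetical, and the exact mirror convention (reflection about the edge pixel, without repeating it) is an assumption, since the text does not specify it.

```python
import random

def random_center(r_max, L=256, rng=random):
    """Pick an amoeba center so that a contour of radius r_max stays in the image."""
    x0 = rng.uniform(r_max, L - r_max)
    y0 = rng.uniform(r_max, L - r_max)
    return x0, y0

def reflect(coord, size=256):
    """Reflect an integer pixel coordinate back into the range [0, size - 1].

    Assumed convention: mirror about the edge pixel, so -1 maps to 1
    and size maps to size - 2.
    """
    period = 2 * (size - 1)
    coord = coord % period      # Python's % is non-negative for a positive modulus
    return coord if coord <= size - 1 else period - coord
```

Under this convention, in-bounds coordinates are unchanged and out-of-bounds points are folded back by the distance they overshoot the edge.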

Target images always consisted of 1 set of amoeba frag- 
ments and 2 sets of clutter fragments. Distractor images 
consisted of 3 sets of clutter fragments and thus, averaged 
over the entire image set, had the same mean luminance 
and the same variance as the target images. Mask images 
were constructed following a procedure nearly identical 
to that used for constructing distractor images, except 
that mask images consisted of 6 sets of clutter fragments,
obtained by randomly rotating the 6 original amoeba objects
used in constructing the corresponding target and
distractor images. All contour fragments were initially 
represented as a set of points in polar coordinates, cor- 
responding to the radius at each discrete polar angle. 
Points along the contour were then transformed back to 
Cartesian coordinates and rounded to the nearest dis- 
crete pixel value. MATLAB scripts for generating the 
image set used in this study are publicly available at: 
http://petavision.sourceforge.net.
Human psychophysics

Human performance was evaluated using two- 
alternative forced choice (2AFC) psychophysical exper- 
iments. There were 5 subjects, all with normal or 
corrected-to-normal vision. One subject only contributed 
data for a portion of the tested SOAs. Each subject was 
seated in a dark room, at an approximate distance of 65 
cm from a 19-inch nominal (36.2 x 27 cm actual size) Hi- 
tachi 751 CRT monitor. Images spanned a viewing angle 
of approximately 7° x 7°. The monitor resolution was 
1024 x 768 pixels and the refresh rate was 100 Hz. The
display was driven by a dual-core 3.0 GHz Mac Pro, with
MATLAB 7.6 running Psychtoolbox [55].

After a short training period to familiarize the subject 
with the task, one target image and one distractor image 
were shown side by side, followed by a mask intended to 
interrupt cognitive processing of the target and distrac- 
tor images. Two separate sets of experiments were con- 
ducted for each subject. In one set, the SOA was chosen 
randomly from the values 40, 80, 120 ms. For the second 
set of experiments, the SOA was chosen randomly from 
the values 20, 160, 200 ms. The duration of the stimulus 
was always the same as the SOA, and thus both the target 
and distractor images remained visible until mask onset. 
The duration of the mask was always 100 ms. Each sub- 
ject was shown 1200 images divided into 10 blocks of 120 
images, with rest breaks in between blocks (rest break duration
was at the discretion of each subject). The pace of
the experiment was under the control of the subject, who
initiated each trial using the space bar. A small temporal 
jitter, chosen uniformly between 0 and 250 ms, was added
to the interval preceding each trial, to prevent entrain- 
ment. Task conditions, consisting of variations in both 
the SOA and the number of radial frequencies K, were
randomly interleaved such that each condition occurred 
the same number of times over the course of the entire 
experiment. 

On each trial, subjects indicated which side contained 
the target, using a mouse-driven slider bar to report confidence
(see Figure 5). The reported confidence values
were used to construct receiver operating characteristic 
(ROC) curves, which plot the percentage of true posi- 
tives (or hits) against the percentage of false positives (or 
false alarms), with each true/false positive pair obtained 
by setting a confidence threshold at a different location 
along the slider bar. A correct response was not neces- 
sarily considered a true positive: to generate one point 
on the ROC curve, the reported confidence on each trial 
was measured relative to the current threshold position, 
which could be to either the left or to the right of center. 
Thus, a trial might be labeled as incorrect, even though 
the subject moved the slider bar in the correct direction, 
as long as the threshold level was not exceeded. Specifi- 
cally, whenever the reported confidence fell to the left of 
threshold, the corresponding trial was treated as though 
the subject reported the target as being to the left, even 
if the threshold location had been set to the right of cen- 
ter and the confidence bar had actually been slid to the 
right. Likewise, when the reported confidence fell to the 
right of the current threshold position, the trial was al- 
ways treated as if the subject had reported the target to 
the right, again regardless of how the subject moved the 
slider bar relative to the center position. By choosing 
a range of threshold positions, spanning the full range 
of reported confidence values, a complete ROC curve 
was obtained. Note that as the threshold was moved 
closer to the left edge of the slider bar, the percentage of 
true and false positives both approached minimum val- 
ues, since only trials with very high reported confidence 
could contribute to either the true positive or false posi- 
tive rate (most trials were rejected as either true or false 
negatives). As the threshold position moved closer to 
the center of the confidence slider bar, the percentage of 
true positives increased. Finally, as the threshold was 
moved closer to the right edge of the slider bar, both 
the true positive rate and the percentage of false posi- 
tives approached maximum values. The true positive rate 
averaged over all false positive rates, or the area under 
the ROC curve (AUC), was used as an overall measure 
of subject performance. The AUC is equivalent to the 
probability that a randomly chosen target image will be 
correctly classified relative to a randomly chosen distrac- 
tor image, and thus directly predicts performance on the 
2AFC task. Results for each SOA and for each value of 
K were averaged over 5 subjects. Error bars denote the 
standard deviation over the 5 subjects. 
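
The threshold sweep used to build the ROC curve can be sketched as follows. This is an illustrative Python version under stated assumptions: confidence is encoded as a signed slider position (negative for left, positive for right), "target on the right" is treated as the positive class, and the function names are hypothetical.

```python
def roc_curve(confidence, target_right):
    """Return ROC points (false positive rate, true positive rate).

    confidence   : signed slider positions, one per trial.
    target_right : booleans, True when the target was actually on the right.
    A trial counts as a 'right' response whenever its confidence is at or
    above the current threshold, regardless of the sign of the confidence.
    """
    n_pos = sum(1 for r in target_right if r)
    n_neg = len(target_right) - n_pos
    points = [(0.0, 0.0)]
    # Sweep the threshold from the highest to the lowest reported confidence.
    for t in sorted(set(confidence), reverse=True):
        tp = sum(1 for c, r in zip(confidence, target_right) if c >= t and r)
        fp = sum(1 for c, r in zip(confidence, target_right) if c >= t and not r)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve, by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

In the absence of tied confidence values, the area computed this way equals the fraction of (positive, negative) trial pairs in which the positive trial received the higher confidence, which is the sense in which the AUC predicts 2AFC performance.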
Model

Model cortical association fields were based on differ- 
ences in the coactivation statistics of orientation-selective 
filter elements drawn from target and distractor images. 
Geisler and Perry measured co-occurrence statistics for 
oriented edges in human-segmented natural images [25]
and found a close correspondence to human judgments 
as to whether pairs of short line fragments were drawn 
from the same or different contours. Thus, we refer to 
the difference in coactivation statistics between target 
object and distractor images as Object-Distractor Dif- 
ference (ODD) kernels. ODD kernels were trained using 
40,000 target and 40,000 distractor images, each divided
into 4 sets of 10,000 images each, with each set associated
with a different value of K = 2, 4, 6, 8. The order in
which the images were presented had no bearing on the 
final form of the ODD kernel; that is, there was no tem- 
poral component to the training. Training with more 
images did not substantively improve performance, al- 
though small differences were observed in the ODD ker- 
nels trained using a smaller number of images (10,000 
target and 10,000 distractor images). 

Each 256 x 256 pixel training image activated a regular 
array of 256 x 256 retinal elements whose outputs were either
0 or 1, depending on whether the corresponding image
pixel was ON or OFF, respectively. Each retinal unit
activated a local neighborhood of orientation-selective filters,
which spanned 8 angles spaced uniformly between
0 and $\pi$. To mitigate aliasing effects, the orientation-selective
filters were rotated by a small, fixed offset, equal
to $\pi/16$, relative to the axis of the training images. All
orientation-selective filters were 7x7 pixels in extent and 
consisted of a central excitatory subunit, represented by 
an elliptical Gaussian with a standard deviation of 7.0 in 
the longest direction and an aspect ratio of 7.0, flanked 
by two inhibitory subunits whose shapes were identical 
to the central excitatory subunit but were offset by ±1.4 
pixels in the direction orthogonal to the preferred axis. 

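The weight $W_\theta(x_1 - x_2, y_1 - y_2)$ from a retinal element
at $(x_2, y_2)$ to a filter element at $(x_1, y_1)$ with dominant
orientation $\theta$ was given by a sum over excitatory and
inhibitory subunits:

$$
\begin{aligned}
W_\theta(x_1 - x_2, y_1 - y_2) &= W_\theta(\mathbf{r}_1 - \mathbf{r}_2) \\
&= A \Big\{ \exp\Big[ -\tfrac{1}{2} (\mathbf{r}_1 - \mathbf{r}_2) \cdot R_\theta^{-1} \sigma^{-1} R_\theta \cdot (\mathbf{r}_1 - \mathbf{r}_2) \Big] \\
&\qquad - \exp\Big[ -\tfrac{1}{2} (\mathbf{r}_1 + \mathbf{f} - \mathbf{r}_2) \cdot R_\theta^{-1} \sigma^{-1} R_\theta \cdot (\mathbf{r}_1 + \mathbf{f} - \mathbf{r}_2) \Big] \\
&\qquad - \exp\Big[ -\tfrac{1}{2} (\mathbf{r}_1 - \mathbf{f} - \mathbf{r}_2) \cdot R_\theta^{-1} \sigma^{-1} R_\theta \cdot (\mathbf{r}_1 - \mathbf{f} - \mathbf{r}_2) \Big] \Big\}, \qquad (3)
\end{aligned}
$$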



where the position vector $\mathbf{r}_i$ is given by $[x_i, y_i]$ and
the matrix $\sigma$ = [ J 7 ] describes the shape of the elliptical
Gaussian subunits for $\theta = 0$. In Eq. (3), $R_\theta$ is a unitary
rotation matrix,

$$R_\theta = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}, \qquad (4)$$

and $\mathbf{f} = [0.0, 1.4]$ is a translation vector in the direction
orthogonal to the dominant orientation when $\theta = 0$.
The amplitude A was determined empirically so that the 
total integrated strength of all excitatory connections 
made by each retinal unit equaled 20.0 (and thus the 
total strength of all inhibitory connections made by each 
retinal unit equaled -40.0). Mirror boundary conditions
were used to mitigate edge effects. The retinal input to
each orientation-selective filter element $s(x_1, y_1, \theta)$ was
then given by

$$s(x_1, y_1, \theta) = \sum_{x_2, y_2} W_\theta(x_1 - x_2, y_1 - y_2)\, I^{(x_1, y_1)}(x_2, y_2), \qquad (5)$$

where $I^{(x_1, y_1)}$ is the 7x7 binary input image patch centered
on $(x_1, y_1)$. The sum is over all pixels $(x_2, y_2)$ that
are part of this image patch. The initial output of each
orientation-selective filter element $z_0(x_1, y_1, \theta)$ was obtained
by comparing the sum of its excitatory and inhibitory
retinal input to a fixed threshold of 0.5. Values
below threshold were set to 0, whereas values above unity
were set to 1.0. Thus



$$z_0(x_1, y_1, \theta) = g(s(x_1, y_1, \theta)), \qquad (6)$$

where the function

$$g(s) = \begin{cases} 0, & s < 0.5 \\ s, & 0.5 \le s \le 1.0 \\ 1, & s > 1.0 \end{cases} \qquad (7)$$

is an element-wise implementation of these thresholds.
The responses of all suprathreshold orientation-selective 
filters contributed to the coactivation statistics, with only 
the relative distance, direction, and orientation of filter 
pairs recorded. Because of the threshold condition, only 
the most active orientation-selective filters contributed to 
the coactivation statistics. 
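
The filter stage can be illustrated with the short Python sketch below. It is not the authors' implementation: the function names are hypothetical, the amplitude A is left unnormalized, and the Gaussian widths (7.0 along the preferred axis and 1.0 across it, inferred from the stated aspect ratio) are assumptions about how the shape matrix enters the exponent.

```python
import math

def g(s):
    """Piecewise-linear threshold: 0 below 0.5, identity up to 1.0, then 1.0."""
    if s < 0.5:
        return 0.0
    return min(s, 1.0)

def weight(dx, dy, theta, A=1.0, sigma_long=7.0, sigma_short=1.0, offset=1.4):
    """Excitatory center minus two flanking inhibitory lobes, for orientation theta."""
    # Rotate the displacement into the frame of the filter.
    u = math.cos(theta) * dx + math.sin(theta) * dy     # along the preferred axis
    v = -math.sin(theta) * dx + math.cos(theta) * dy    # orthogonal to it

    def lobe(shift):
        return math.exp(-0.5 * ((u / sigma_long) ** 2
                                + ((v + shift) / sigma_short) ** 2))

    return A * (lobe(0.0) - lobe(offset) - lobe(-offset))

def filter_input(patch, theta, A=1.0):
    """Weighted sum over a square binary patch centered on the filter element."""
    half = len(patch) // 2
    return sum(weight(col - half, row - half, theta, A) * patch[row][col]
               for row in range(len(patch))
               for col in range(len(patch)))
```

For a horizontal line through the center of a 7x7 patch, the summed input is positive for a filter with theta = 0 and negative for theta = pi/2, so only the aligned filter survives the threshold `g`.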

For every suprathreshold filter element extracted from 
the i-th target image, coactivation statistics were accu- 
mulated relative to all surrounding suprathreshold filter 
elements extracted from the same image. Thus the ODD 
kernel G is given by 



$$G^{t_i}_{\theta - \theta_0}\big(\rho(x - x_0, y - y_0), \phi_{\theta_0}(x - x_0, y - y_0)\big) = \sum_{x, y} z_0(x, y, \theta), \qquad (8)$$

where the radial distance $\rho$ is a function of the (x, y)
coordinates of the two filter elements, the direction $\phi$ is
the angle measured relative to $\theta_0$, the sum is over all
suprathreshold elements within a cutoff radius of 32, the
superscript $t_i$ denotes the i-th target image, and the difference
in the orientations of the two filter elements $\theta - \theta_0$
is taken modulo $\pi$. Because the amoeba/no-amoeba image
set was translationally invariant and isotropic, the
central filter element may without loss of generality be
shifted and rotated to a canonical position and orientation,
so that the dependence on $\rho_0, \phi_0, \theta_0$ may be omitted.
The coactivation statistics for the i-th target image can
then be written simply as $G^{t_i}_\theta(\rho, \phi)$, where $(\rho, \phi)$ gives
the distance and direction from the origin to the filter element
with orientation $\theta$, given that the filter element at
the origin has orientation $\pi/16$. An analogous expression
gives the coactivation statistics for the j-th distractor image,
$G^{d_j}_\theta(\rho, \phi)$. The ODD kernel $G_\theta(\rho, \phi)$ is given by the
difference

$$G_\theta(\rho, \phi) = W_+ \sum_i G^{t_i}_\theta(\rho, \phi) - W_- \sum_j G^{d_j}_\theta(\rho, \phi), \qquad (9)$$

where the sums are taken over all target and distractor 
images and the normalization factors W+ and W- are de- 
termined empirically so as to yield a total ODD strength 
of 325 (see Figure 8 and Results), defined as the sum over
all ODD kernel elements arising from either the target or 
distractor components. By construction, the sum over 
all ODD kernel elements equals zero, so that the average 
lateral support for randomly distributed edge fragments 
would be neutral. Our results did not depend critically 
on the RMS magnitude of the ODD kernel (see Figure [s]). 
To minimize storage requirements individual connection 
strengths were stored as unsigned 8-bit integers, so that 
the results of the present study did not depend on com- 
putation of high precision kernels. 
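
The normalization that makes the ODD kernel sum to zero can be sketched as follows. This is a minimal Python illustration, not the published code: the dictionary-based representation, the bin keys, and the function name `odd_kernel` are assumptions, and the 8-bit quantization step is omitted.

```python
def odd_kernel(target_counts, distractor_counts, strength=325.0):
    """Difference of normalized target and distractor coactivation statistics.

    Both inputs map a hypothetical bin key (rho_bin, phi_bin, theta_index) to
    coactivation counts accumulated over all images of that class. Each class
    is scaled so that its summed contribution equals `strength`, which makes
    the sum over all kernel elements zero by construction.
    """
    w_plus = strength / sum(target_counts.values())
    w_minus = strength / sum(distractor_counts.values())
    keys = set(target_counts) | set(distractor_counts)
    return {k: w_plus * target_counts.get(k, 0.0)
               - w_minus * distractor_counts.get(k, 0.0)
            for k in keys}
```

Bins that occur relatively more often in target images receive positive (supportive) weights, and bins that occur relatively more often in distractor images receive negative (suppressive) weights.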

As described above, the canonical ODD kernel is defined
relative to filter elements at the origin with orientation
$\pi/16$. Filter elements located away from the origin
can be accounted for by a trivial translation. To account 
for filter elements with different orientations, separate 
ODD kernels were computed for all 8 orientations then 
rotated to a common orientation and averaged to pro- 
duce a canonical ODD kernel. The canonical kernel was 
then rotated in steps between 0 and $\pi$ (offset by $\pi/16$)
and then interpolated to Cartesian x-y axes by rounding
to the nearest integer coordinates. Although it has been
demonstrated that global contour saliency is enhanced
for orientations along the cardinal axes [58], this bias is
by construction absent from this model. 

ODD kernels were used to compute lateral support for
each orientation-selective filter element, via linear convolution.
The output of each filter element was then modulated
in a multiplicative fashion by the computed lateral
support. The procedure was iterated by calculating new 
values for the lateral support s, which were again used 
to modulate filter outputs in a multiplicative fashion: 

$$s_k(x, y, \theta) = z_{k-1}(x, y, \theta) \sum_{x', y', \theta'} G_\theta(\rho, \phi_\theta)\, z_{k-1}(x', y', \theta'), \qquad (10)$$

where the subscript k denotes the k-th iteration. The
same kernel was used for all iterations. All source code 
used to train and apply cortical association fields is pub- 
licly available at 

http://sourceforge.net/projects/petavision/.
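
The iterative multiplicative modulation can be illustrated with a toy Python sketch. It is not the released source code: the sparse dictionary representation, the kernel key layout, and the reapplication of the threshold function after each iteration are assumptions made for the example.

```python
def g(s):
    """Piecewise-linear threshold: 0 below 0.5, identity up to 1.0, then 1.0."""
    if s < 0.5:
        return 0.0
    return min(s, 1.0)

def iterate(z, kernel, n_iter):
    """Apply n_iter rounds of multiplicative lateral support.

    z      : dict mapping (x, y, theta_index) -> filter output.
    kernel : dict mapping (dx, dy, theta_center, theta_other) -> weight
             (hypothetical layout; missing entries count as zero).
    """
    for _ in range(n_iter):
        new_z = {}
        for (x, y, th), z_center in z.items():
            # Lateral support: kernel-weighted sum over all other elements.
            support = sum(kernel.get((x2 - x, y2 - y, th, th2), 0.0) * z2
                          for (x2, y2, th2), z2 in z.items())
            # Multiplicative modulation, followed by the assumed threshold.
            new_z[(x, y, th)] = g(z_center * support)
        z = new_z
    return z
```

In this toy setting, two neighboring collinear elements that support each other remain active, whereas an isolated element with no lateral support is driven to zero after one iteration.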

To measure model performance, in each trial 1 target 
image and 1 distractor image were tested as a pair, so 
as to emulate the 2AFC format of the human experi- 
ments. The orientation-selective filter responses to both 
test images were evaluated after k = 0, 1, 2, 3, 4 iterations
of the ODD kernel. The total activation across all filter
elements, $T = \sum_{x, y, \theta} z_k(x, y, \theta)$, was used to compare
the two test images. Since the model cortical association 
fields tended to support contour fragments belonging to 
amoebas while inhibiting clutter fragments, the image 
with higher total activation T was assumed to be the 
target image. Error bars for the model performance (as 
shown in Figure [t]) were estimated using the standard 
deviation of a binomial distribution with probability p 
equal to percent correct and N equal to the number of 
trials. 
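
The model's 2AFC decision rule and its error bars can be sketched as follows. The function names are hypothetical, ties are counted as errors by assumption, and the error bar is assumed to be the binomial standard deviation expressed as a proportion rather than as a count.

```python
import math

def percent_correct(target_totals, distractor_totals):
    """Fraction of trials in which the target image has the higher total activation T."""
    wins = sum(1 for t, d in zip(target_totals, distractor_totals) if t > d)
    return wins / len(target_totals)

def binomial_error(p, n):
    """Standard deviation of a proportion correct p estimated from n trials."""
    return math.sqrt(p * (1.0 - p) / n)
```

For example, a proportion correct of 0.75 measured over 100 trials gives an error bar of about 0.043.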